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Abstract 

This paper presents an average case denoising performance analysis for the Subspace Pursuit (SP), 
the CoSaMP and the IHT algorithms. This analysis considers the recovery of a noisy signal, with the 
, ^ assumptions that (i) it is corrupted by an additive random white Gaussian noise; and (ii) it has a K-sparse 

On I representation with respect to a known dictionary D. The proposed analysis is based on the Restricted- 

m 

ly^ . Isometry-Property (RIP), establishing a near-oracle performance guarantee for each of these algorithms. 



The results for the three algorithms differ in the bounds' constants and in the cardinality requirement 
(^ ' (the upper bound on K for which the claim is true). 

Similar RIP-based analysis was carried out previously for the Dantzig Selector (DS) and the Basis 
Pursuit (BP). Past work also considered a mutual-coherence-based analysis of the denoising performance 
j^ ' of the DS, BP, the Orthogonal Matching Pursuit (OMP) and the thresholding algorithms. This work differs 

CZ ' from the above as it addresses a different set of algorithms. Also, despite the fact that SP, CoSaMP, and 

IHT are greedy-like methods, the performance guarantees developed in this work resemble those obtained 
for the relaxation-based methods (DS and BP), suggesting that the performance is independent of the 
sparse representation entries contrast and magnitude. 



I. Introduction 

A. General — Pursuit Methods for Denoising 

The area of sparse approximation (and compressed sensing as one prominent manifestation of its 
applicability) is an emerging field that get much attention in the last decade. In one of the most basic 
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problems posed in this field, we consider a noisy measurement vector y G M™ of the form 

y = Dx + e, (I.l) 

where x G R^ is the signal's representation with respect to the dictionary D G ]^"t-xN ^^^q^q N > m. 
The vector e G M™ is an additive noise, which is assumed to be an adversial disturbance, or a random 
vector - e.g., white Gaussian noise with zero mean and variance o"^. We further assume that the columns 
of D are normalized, and that the representation vector x is ET-sparse, or nearly soj^ Our goal is to find 
the K-sparse vector x that approximates the measured signal y. Put formally, this reads 

min ||y — Dx||2 subject to ||x||q = ET, (1.2) 

where ||x||q is the io pseudo-norm that counts the number of non-zeros in the vector x. This problem 
is quite hard and problematic m, lH, 131, H. A straight forward search for the solution of (II.2b is 
an NP hard problem as it requires a combinatorial search over the support of x [5|. For this reason, 
approximation algorithms were proposed - these are often referred to as pursuit algorithms. 

One popular pursuit approach is based on ii relaxation and known as the Basis Pursuit (BP) 161 or 
the Lasso Q. The BP aims at minimizing the relaxed objective 

(PI): minx||x||-|^ subject to ||y — Dx||2 < e^p, (1.3) 

where ebp is a constant related to the noise power. This minimizing problem has an equivalent form: 

{BP): miiix -||y-Dx||^ + 7Bp||x||i, (1.4) 

where jbp is a constant related to cbp- Another ^i-based relaxed algorithm is the Dantzig Selector (DS), 
as proposed in HI. The DS aims at minimizing 

(DS) : min ||x||i subject to ||D*(y - Dx)||^ < eos, (1-5) 

where eos is a constant related to the noise power. 

A different pursuit approach towards the approximation of the solution of ( II.2I ) is the greedy strategy 
O, ITOl . ITTil . leading to algorithms such as the Matching Pursuit (MP) and the Orthogonal Matching 
Pursuit (OMP). These algorithms build the solution x one non-zero entry at a time, while greedily aiming 
to reduce the residual error y — Dx L. 

'a more exact definition of nearly sparse vectors will be given later on 
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The last family of pursuit methods we mention here are greedy -like algorithms that differ from MP and 
OMP in two important ways: (i) Rather than accumulating the desired solution one element at a time, a 
group of non-zeros is identified together; and (ii) As opposed to the MP and OMP, these algorithms enable 
removal of elements from the detected support. Algorithms belonging to this group are the Regularized 
OMP (ROMP) 113, the Compressive Sampling Matching Pursuit (CoSaMP) |13|, the Subspace-Pursuit 
(SP) 1 14], and the Iterative Hard Thresholding (IHT) [|15il . This paper focuses on this specific family of 
methods, as it poses an interesting compromise between the simplicity of the greedy methods and the 
strong abilities of the relaxed algorithms. 

B. Performance Analysis — Basic Tools 

Recall that we aim at recovering the (deterministic!) sparse representation vector x. We measure the 
quality of the approximate solution x by the Mean-Squared-Error (MSE) 

MSE(x) = £;||x-x||2, (1.6) 

where the expectation is taken over the distribution of the noise. Therefore, our goal is to get as small 
as possible error. The question is, how small can this noise be? In order to answer this question, we 
first define two features that characterize the dictionary D - the mutual coherence and the Restricted 
Isometry Property (RIP). Both are used extensively in formulating the performance guarantees of the sort 
developed in this paper. 

The mutual-coherence /i |[T6l . ifTTl . |[T8l of a matrix D is the largest absolute normalized inner product 
between different columns from D. The larger it is, the more problematic the dictionary is, because in 
such a case we get that columns in D are too much alike. 

Turning to the RIP, it is said that D satisfies the A'-RIP condition with parameter 6k if it is the smallest 
value that satisfies 

(1 - 5k) ||x||^ < ||Dx||^ < (1 + Sk) \\x\\l (1-7) 

for any K -sparse vector x |[T9l . |[20ll . 

These two measures are related by 6k < {K — l)/i f^P]. The RIP is a stronger descriptor of D as 
it characterizes groups of K columns from D, whereas the mutual coherence "sees" only pairs. On the 
other hand, computing /i is easy, while the evaluation of 6k is prohibitive in most cases. An exception to 
this are random matrices D for which the RIP constant is known (with high probability). For example. 
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if the entries of ^/mD are drawn from a white Gaussian distributior|3 and m > C K log{N / K) / e^ , then 
with a very high probability Sr^ < e |[T9l. ll22l. 

We return now to the question we posed above: how small can the error MSE(x) be? Consider an 
oracle estimator that knows the support of x, i.e. the locations of the K non-zeros in this vector. The 
oracle estimator obtained as a direct solution of the problem posed in (11.21 ) is easily given by 



^oracle = ^tY, (1-8) 

where T is the support of x and D^ is a sub-matrix of D that contains only the columns involved in 
the support T. Its MSE is given by O 



MSE{-korade) — E \\:>i. — ^orade\\2 — ^ 



D^^e 



2 

(1.9) 

2 



In the case of a random noise, as described above, this error becomes 



MSE(Xorade) = E 



D^e 



(1. 10) 

2 



trace {(D5.Dt)"^}ct 



2 



K 2 

^ T^" ■ 

This is the smallest possible error, and it is proportional to the number of non-zeros K multiplied by 
(7^. It is natural to ask how close do we get to this best error by practical pursuit methods that do not 
assume the knowledge of the support. This brings us to the next sub-section. 

C. Performance Analysis — Known Results 

There are various attempts to bound the MSE of pursuit algorithms. Early works considered the 
adversary case, where the noise can admit any form as long as its norm is bounded ||23l . Il24l . El . 
Q. These works gave bounds on the reconstruction error in the form of a constant factor (Const > 1) 
multiplying the noise power, 

||x — x||2 < Const • ||e||2 . (I-ll) 

Notice that the cardinality of the representation plays no role in this bound, and all the noise energy is 
manifested in the final error. 

^The multiplication by ^/rn comes to normalize the columns of the effective dictionary D. 
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One such example is the work by Candes and Tao, reported in |20], which analyzed the BP error. This 
work have shown that if the dictionary D satisfies 5k + 52K + 5^K < 1 then the BP MSE is bounded 
by a constant times the energy of the noise, as shown above. The condition on the RIP was improved 
to 62K < \/2 — 1 in 1251 . Similar tighter bounds are 6i,q25K < V^ — 1 and S^k < 4 — 2\/3 |l26l|, or 
6k < 0.307 ll27l . The advantage of using the RIP in the way described above is that it gives a uniform 
guarantee: it is related only to the dictionary and sparsity level. 

Next in line to be analyzed are the greedy methods (MP, OMP, Thr) |l23l, Q- UnUke the BP, 
these algorithms where shown to be more sensitive, incapable of providing a uniform guarantee for 
the reconstruction. Rather, beyond the dependence on the properties of D and the sparsity level, the 
guarantees obtained depend also on the ratio between the noise power and the absolute values of the 
signal representation entries. 

Interestingly, the greedy-like approach, as practiced in the ROMP, the CoSaMP, the SP, and the IHT 
algorithms, was found to be closer is spirit to the BP, all leading to uniform guarantees on the bounded 
MSE. The ROMP was the first of these algorithms to be analyzed |[T2l . leading to the more strict 
requirement 52k < 0.03/VlogK. The CoSaMP US and the SP lEl that came later have similar RIP 
conditions without the log K factor, where the SP result is slightly better. The IHT algorithm was also 
shown to have a uniform guarantee for bounded error of the same flavor as shown above lITSl . 

All the results mentioned above deal with an adversial noise, and therefore give bounds that are related 
only to the noise power with a coefficient that is larger than 1, implying that no effective denoising is to 
be expected. This is natural since we consider the worst case results, where the noise can be concentrated 
in the places of the non-zero elements of the sparse vector. To obtain better results, one must change the 
perspective and consider a random noise drawn from a certain distribution. 

The first to realize this and exploit this alternative point of view were Candes and Tao in the work 
reported in |i8| that analyzed the DS algorithm. As mentioned above, the noise was assumed to be random 
zero-mean white Gaussian noise with a known variance o"^. For the choice e^s = ■\/2(l + a) logN ■ a, 
and requiring 62K + ^SK < 1> the minimizer of dl.SI ). x/)5, was shown to obey 

||x - 6cDs\\l < Cls ■ (2(1 + a) logN) ■ Ka\ (1.12) 

with probabiUty exceeding 1 - {^/tt{1 + a)\ogN ■ N"-)'^, where Cos = 4/(1 - 2(53/^)0 Up to a 
constant and a log N factor, this bound is the same as the oracle's one in ( II.9I ). The log N factor in (11.121 ) 

^In (H) a slightly different constant was presented. 
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in unavoidable, as proven in II28I . and therefore this bound is optimal up to a constant factor. 

A similar result was presented in |[29l for the BP, showing that the solution of (11.41) for the choice 

7bp = ^8cr2(l + a)log A^, and requiring 52k + ^^ZK < 1> satisfies 

||x - ±Ds\\l < CIp ■ (2(1 + a) logiV) • Ka'' (1.13) 

with probability exceeding 1 — (A^")~^. This result is weaker than the one obtained for the DS in three 
ways: (i) It gives a smaller probability of success; (ii) The constant Cbp is larger, as shown in [21] 
{Cbp > 32/k^, where k < 1 is defined in [29]); and (iii) The condition on the RIP is stronger. 

Mutual-Coherence based results for the DS and BP were derived in ll30l . Il2ll . In ||2T1| results were 
developed also for greedy algorithms - the OMP and the thresholding. These results rely on the contrast 
and magnitude of the entries of x. Denoting by ^greedy the reconstruction result of the thresholding and 
the OMP, we have 

||X - i^greedyWl ^ <^greedy ' (2(1 + a) log N) ■ Ka\ (1.14) 



where C greedy < 2 and with probability exceeds 1 — {^jT^{l + a)\ogN ■ N'^) ^. This result is true for 
the OMP and thresholding under the condition 



^min\ - 2a^2{l + a)logN ^ I |x™„| OMP 



{2K-1)^ [ |x„,,| THR 

where |xmm| and |xmaa| are the minimal and maximal non-zero absolute entries in x. 

D. This Paper Contribution 

We have seen that greedy algorithms' success is dependent on the magnitude of the entries of x and the 
noise power, which is not the case for the DS and BP. It seems that there is a need for pursuit algorithms 
that, on one hand, will enjoy the simplicity and ease of implementation as in the greedy methods, while 
being guaranteed to perform as well as the BP and DS. Could the greedy-like methods (ROMP, CoSaMP, 
SP, IHT) serve this purpose? The answer was shown to be positive for the adversial noise assumption, 
but these results are too weak, as they do not show the true denoising effect that such algorithm may 
lead to. In this work we show that the answer remains positive for the random noise assumption. 

More specifically, in this paper we present RIP-based near-oracle performance guarantees for the SP, 
CoSaMP and IHT algorithms (in this order). We show that these algorithms get uniform guarantees, just 
as for the relaxation based methods (the DS and BP). We present the analysis that leads to these results 
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and we provide explicit values for the constants in the obtained bounds. 

The organization of this paper is as follows: In Section |ll] we introduce the notation and propositions 
used for our analysis. In Section |lll] we develop RIP-based bounds for the SP, CoSaMP and the IHT 
algorithms for the adversial case. Then we show how we can derive from these a new set of guarantees 
for near oracle performance that consider the noise as random. We develop fully the steps for the SP, 
and outline the steps needed to get the results for the CoSaMP and IHT. In Section |IV] we present some 
experiments that show the performance of the three methods, and a comparison between the theoretical 
bounds and the empirical performance. In Section |V] we consider the nearly-sparse case, extending all 
the above results. Section |Vl] concludes our work. 

II. Notation and Preliminaries 
The following notations are used in this paper: 

• supp(x) is the support of x (a set with the locations of the non-zero elements of x). 

• |supp(x)| is the size of the set supp(x). 

• supp(x, K) is the support of the largest K magnitude elements in x. 

• Dy is a matrix composed of the columns of the matrix D of the set T. 

• In a similar way, xy is a vector composed of the entries of the vector x over the set T. 

• T"-^ symbolizes the complementary set of T. 

• T — T is the set of all the elements contained in T but not in T. 

• We will denote by T the set of the non-zero places of the original signal x; As such, |r| < K when 
X is /f -sparse. 

• XX is the vector with the K dominant elements of x. 

• The projection of a vector y to the subspace spanned by the columns of the matrix A (assumed to 
have more rows than columns) is denoted by proj(y, A) = AA^^y. The residual is resid(y, A) = 
y - AAty. 

• Te is the subset of columns of size i^ in D that gives the maximum correlation with the noise vector 
e, namely, 

Tg = argmax ||D^e||2 (II. 1) 

T\ \T\=K 



Te^p is a generalization of Tg where T in (III. 1 b is of size pK, p G N. It is clear that 



n* 



< 

2 
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The proofs in this paper use several propositions from |[T3l . |[T4l . We bring these in this Section, so as 
to keep the discussion complete. 

Proposition 2.1: [Proposition 3.1 in |13|] Suppose D has a restricted isometry constant 6k- Let T be 
a set of K indices or fewer. Then 



|D^y||2< V^ + ^kWy 



D:^y 



2 - \/ ^ ^ "K \\y\\2 

1 „ „ 

< yL 

||D^DTX||2 = (1±(5^)||X||2 

where the last two statements contain upper and lower bounds, depending on the sign chosen. 
Proposition 2.2: [Lemma 1 in |[T4l l Consequences of the RIP: 

1) (Monotonicity of 6k) For any two integers K < K', 6k < 6k'- 

2) (Near-orthogonality of columns) Let I,Jc {1, ■■■,N} be two disjoint sets (/ n J = 0). Suppose 
that (^i/i+iJi < 1- For arbitrary vectors a G mI^I and b G rI-^I, 

|(D/a,Djb)| < (^|/|+|j| ||a||2l|b||2 

and 

||D}Djb||2 <6\i\+\j\ ||b||2. 

Proposition 2.3: [Lemma 2 in |[T4i l Projection and Residue: 

1) (Orthogonality of the residue) For an arbitrary vector y G R™ and a sub-matrix D/ G R™^^ of 
full column-rank, let y^ = resid(y, Dj). Then D^y,. = 0. 

2) (Approximation of the projection residue) Consider a matrix D G M"*^^. Let I, J C {1, ..., N} be 
two disjoint sets, / n J = 0, and suppose that (^i/i+iJi < 1. Let y G span(D/), y^ = proj(y,Dj) 
and Yr = resid(y, Dj). Then 

II II ^ Vl+kl II II 

iiypii2 — 1 r ii.yii2 

and 

Vl+I^l 



1 - ^ma.x{\I\,\J\) 



lyllo < llyrlU < lly|l 
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Proposition 2.4: [Corollary 3.3 in |[T3l l Suppose that D has an RIP constant 5^^. Let Ti be an arbitrary 
set of indices, and let x be a vector. Provided that K > |Ti U supp(x)|, we obtain that 

D^ Drcxrc L < 5^ Xt^c L . (11.2) 

II -'i -^1 -^1 112 — -KM -t 1 112 ^ ^ 

III. Near oracle performance of the algorithms 

Our goal in this section is to find error bounds for the SP, CoSaMP and IHT reconstructions given the 
measurement from (11.11 ). We will first find bounds for the case where e is an adversial noise using the 
same techniques used in llT4l . |[T3l . In these works and in ifTSl . the reconstruction error was bounded by 
a constant times the noise power in the same form as in dl.lll) . In this work, we will derive a bound that 
is a constant times ||D^ e|| (where Tg is as defined in the previous section). Armed with this bound, we 
will change perspective and look at the case where e is a white Gaussian noise, and derive a near-oracle 
performance result of the same form as in ( II.12I ). using the same tools used in fSl. 

A. Near oracle performance of the SP algorithm 

We begin with the SP pursuit method, as described in Algorithm [1] SP holds a temporal solution with 
K non-zero entries, and in each iteration it adds an additional set of K candidate non-zeros that are most 
correlated with the residual, and prunes this list back to K elements by choosing the dominant ones. We 
use a constant number of iterations as a stopping criterion but different stopping criteria can be sought, 
as presented in |[T4l . 

Algorithm 1 Subspace Pursuit Algorithm [Algorithm 1 in fT4l] 

Input: K,T),y where y = Dx + e, K is the cardinality of x and e is the additive noise. 
Output: ii-sp- i^-sparse approximation of x 
Initialize the support: T^ = 0. 
Initialize the residual: y^ = y. 
while halting criterion is not satisfied do 

Find new support elements: Ta = snpp{D*yl~^ , K). 
Update the support: T^ = T^^^ U Ta. 
Compute the representation: Xp = Dt y. 
Prune small entries in the representation: T^ = supp{xp,K). 
Update the residual: yf. = resid(y, D-r^). 
end while 
Form the final solution: xsp^(^rpt-jc = and ^sp,t^ = D-r^y- 
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6 - 64,K + 4i|^. 
(1 - &A-)' 








0.5 






2(S;iA-(l + &A-) 












0.139 





Fig. 1. The coefficients in Jill. lb and dlll.St as functions of Szk- 



Theorem 3.1: The SP solution at the £-th iteration satisfies the recurrence inequality 



(1 - 6sk)^ 
6 - 663K + 4.5g^ 
(1 - <53i^)3 



D* -., I 
I -'^ e 



For SsK < 0.139 this leads to 



(III.l) 



|xy_2^«||2 < 0.5 ||xy_7^«-i||2 + 8.22 ||Dy^e|| • 



(111.2) 



Proof: The proof of the inequality in (IIII. 1 b is given in Appendix [A] Note that the recursive formula 
given dm. 1 b has two coefficients, both functions of S^k- Fig- [D shows these coefficients as a function 
of 63K- As can be seen, under the condition d^x < 0.139, it holds that the coefficient multiplying 
||x7-_7-(!-i II2 is lesser or equal to 0.5, while the coefficient multiplying ||D^^e|| is lesser or equal to 
8.22, which completes our proof. D 



Corollary 3.2: Under the condition 63K < 0.139, the SP algorithm satisfies 



|xr-r42 ^ 2"^ ||x||2 + 2 • 8.22 ||D5.^e| 



(III.3) 



In addition. After at most 



log2 



X 






(111.4) 
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iterations, the solution x^p leads to an accuracy 



|x-X5p||2 < Csp ||D^^e|| 



(III.5) 



where 



CsP = 2.'-''',^^''h-''^<21Al 



(1 - 53k)^ 
Proof: Starting with (IIII.2I) . and applying it recursively we obtain 



(III.6) 



|xr_T«IU < 0.5 ||xT"_r<!-i|U + 8.22 IID^ elL 

\ -I- J- \\ Z \\ -L -L \\ z II J.e||^ 

< 0.5^ ||xr-T«-2||2 + 8.22 • (0.5 + 1) llD^^^eH^ 

< • • • 

A-i \ 

<0.5'''||xr_T.-fc 112 + 8.22- J]]0.5-' ||D^^e| 



(ni.7) 



Setting k = I leads easily to (IIII.3I ). since ||xt_to||2 = ||xt||2 = ||x 
Plugging the number of iterations l* as in (IIII.4I I to (IIII.3I ) yield^ 



|X'p_'p** II 2 



NI2 + 2 (i_ 53^)3 l|DT.e||2 



<|l + 2 (1_53^)3 )FT.e||2. 



A rpi 



(III.8) 



We define T = T and bound the reconstruction error ||x — X5PII2. First, notice that ||x|| = ||x^|| + 



^T-T 



, simply because the true support T can be divided intqj T and the complementary part, T — T. 



Using the facts that x^p = D - y , y = D^-xj' + e, and the triangle inequality, we get 



|X-X5p||2 



< 



< 



Xf - 


-D^y 


+ 

2 


^T~i 


2 






-f- 


-D^(DTXT + e) 


+ 

2 


Xy_^ 


2 


Xf - 


-dJ^d 


TXT 


+ 

2 


Die 

T 


+ 
2 


X^_^ 



(111.9) 



''Note that we have replaced the constant 8.22 with the equivalent expression that depends on 5'j,k - see JIII.U . 
^The vector Xj, is of length ITI = A" and it contains zeros in locations that are outside T. 
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We proceed by breaking the term Dyxy into the sum Y)j,^f:Kj,^rp + Y)rp_f:x.rp_f, and obtain 



|X-X5P||2 < 



X - — ni^ D -X 



(III. 10) 



+ 



7' T— T'^^T— T 



+ 



(Dp^)-iD^e 



+ 



X 



T-T 



The first term in the above inequality vanishes, since T)rp^f:Krp^f = D^x^ (recall that x^ outside 
the support T has zero entries that do not contribute to the multiplication). Thus, we get that x^ — 
D'-D rp^fXrp^f = Xf — Dt D^x^ = 0. The second term can be bounded using Propositions 12.11 and 12.21 



»t 



DlD^ ^x 
< 



f-^T-T T-T 
1 



I -6k 



(D^D^)-1D^D^_^x^_^ 
62K 



XJ rj^XJ qp _^rj^J^qp _^rj^ 



< 

2 1-5k 



^T-T 



Similarly, the third term is bounded using Propositions 12. 1[ and we obtain 



|X-X5P||2 < 1 + 



< 



62K 

1-5k 



1-5: 



3K 



X 



T-T 



^T-T 



+ 



1 



2 I — Sk 
1 



B*^e 



2 1-6: 



3K 



D^e 



where we have replaced 6k and 62K with 63K, thereby bounding the existing expression from above. 
Plugging (IIII.8I ) into this inequality leads to 



|X-X5p||2 



< 



1 



1-6: 



3K 



2 + 2- 



6 - 663K + 4(5|^ 
(1 - 6sKr 



B*^e 



2- 



7 -963K + 761^-61^ 
(1 - 63k)^ 



T>*fe 



Applying the condition 6sk < 0.139 on this equation leads to the result in (1111.5b . 



D 



For practical use we may suggest a simpler term for £* . Since 1 1 D^ e 1 1 is defined by the subset that 
gives the maximal correlation with the noise, and it appears in the denominator of £*, it can be replaced 
with the average correlation, thus I* ^ log2 ( ||x||2 /\^aj . 

Now that we have a bound for the SP algorithm for the adversial case, we proceed and consider a 
bound for the random noise case, which will lead to a near-oracle performance guarantee for the SP 
algorithm. 

Theorem 3.3: Assume that e is a white Gaussian noise vector with variance a'^ and that the columns of 
D are normalized. If the condition 63K < 0.139 holds, then with probability exceeding 1— (^7r(l + a) log A^- 
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AT")-! we obtain 

||x-xsp||2 <Csp-{2{l + a)logN)-Ka^. (III.ll) 

Proof: Following Section 3 in (H it holds true that P Tsup^ |D*e| > a ■ ^2(1 + a) log N^ < 1 - 
{'\/it{1 + a) logN • N"-)~^. Combining this with dlll.SI ). and bearing in mind that |Te| = K, we get the 
stated result. D 

As can be seen, this result is similar to the one posed in lH for the Dantzig-Selector, but with a 
different constant - the one corresponding to DS is ?» 5.5 for the RIP requirement used for the SP. For 
both algorithms, smaller values of 63K provide smaller constants. 

B. Near oracle performance of the CoSaMP algorithm 

We continue with the CoSaMP pursuit method, as described in Algorithm |2l CoSaMP, in a similar way 
to the SP, holds a temporal solution with K non-zero entries, with the difference that in each iteration 
it adds an additional set of 2K (instead of K) candidate non-zeros that are most correlated with the 
residual. Anther difference is that after the punning step in SP we use a matrix inversion in order to 
calculate a new projection for the K dominant elements, while in the CoSaMP we just take the biggest 
K elements. Here also, we use a constant number of iterations as a stopping criterion while different 
stopping criteria can be sought, as presented in flBl. 

Algorithm 2 CoSaMP Algorithm [Algorithm 2.1 in ||13|] 

Input: K, D, y where y = Dx + e, K is, the cardinality of x and e is the additive noise. 
Output: ^CoSaMP'- i^-sparse approximation of x 
Initialize the support: T" = 0. 
Initialize the residual: y^ = y. 
while halting criterion is not satisfied do 

Find new support elements: Ta = supp(D*y^~^, 2K). 
Update the support: f^ = T^^^ U Ta. 
Compute the representation: Xp = D^ y. 
Prune small entries in the representation: T^ = supp(xp,ii'). 
Update the residual: yf = y — Dj'«(xp)7^«. 
end while 
Form the final solution: y^coSaMP,{T'^)o = and i^CoSaMP^T^ = {^p)t^- 

In the analysis of the CoSaMP that comes next, we follow the same steps as for the SP to derive a 
near-oracle performance guarantee. Since the proofs are very similar to those of the SP, and those found 
in |fT3l . we omit most of the derivations and present only the differences. 
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Theorem 3.4: The CoSaMP solution at the £-th iteration satisfies the recurrence inequalit; 

4(54X __ _-_£_! 



I 



^- ^CoSaMP ^< (l_5^^)2 

14 - 654K 



+ - 



1 - S4kY 



^ ^CoSaMP 

II -'e 112 



For 64K < 0.1 this leads to 



(III. 12) 



X — X, 



CoSaMP 



<0.5 



X — X 



-/-l 



CoSaMP 



+ 16.6 IID^. ell 



(III. 13) 



Proof: The proof of the inequality in (IIII.12I ) is given in Appendix |D] In a similar way to the proof in 



the SP case, under the condition 64K < 0.1, it holds that the coefficient multiplying 



X — X 



-1 

CoSaMP 



is smaller or equal to 0.5, while the coefficient multiplying ||D^ e|| is smaller or equal to 16.6, which 



completes our proof. 



D 



Corollary 3.5: Under the condition S^k < 0.1, the CoSaMP algorithm satisfies 



X — X 



CoSaMP 



< 2"^ ||x|L + 2 -16.6110:^6 



Te^ll2- 



(III. 14) 



In addition. After at most 



logs 



X 



I -t e I I Z ^ 



iterations, the solution ^coSaMP leads to an accuracy 



(III. 15) 



x-x 



CoSaMP 



< CcoSaMP D^ e , 



(HI. 16) 



where 

CcoSaMP = '' ~!''T V'"" ^ 34.1. (III.17) 

(1 -04Kr 

Proof: Starting with ( IIII.13I ). and applying it recursively, in the same way as was done in the proof of 

*The observant reader will notice a delicate difference in terminology between this theorem and Theorem 13. II While here the 
recurrence formula is expressed with respect to the estimation error, ||x — x^oSaA/pIL' Theorem 13. II uses a slightly different 
error measure, ||xj,_j,<?|| . 
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Corollary 13.51 we obtain 



< 0.5*^ 



-^ ^CoSaMP 



(III. 18) 



+16.6- |^0.5M||D^^e||2. 

Setting k = i leads easily to (IIII.14I ). since ||x — x^o^ajv/pIL ~ Il-'^ll2- 
Plugging the number of iterations £* as in (IIII.15I ) to (IIII.14I ) yield^ 



X — X 



CoSaMP 



<2-^- 11x11 I 2 ^^-^^^K IIP* ell 

2- "''"2 + ^ (i_5^^)2l|iJT.e||2 

- (l-54i,)2 Il°^^^ll2- 

Applying the condition (54i<' < 0.1 on this equation leads to the result in (IIII.16I ). D 

As for the SP, we move now to the random noise case, which leads to a near-oracle performance 
guarantee for the CoSaMP algorithm. 

Theorem 3.6: Assume that e is a white Gaussian noise vector with variance a'^ and that the columns of 



D are normalized. If the condition S^x < 0.1 holds, then with probability exceeding 1— (\/vr(l + a) logN- 
N"-y^ we obtain 

||x - ^coSaMpWl < ChoSaMP ' (2(1 + a) log N) ■ Ka\ (III.19) 

Proof: The proof is identical to the one of Theorem 13.61 D 

Fig. |2] shows a graph of CcoSaMP as a function of S^k- In order to compare the CoSaMP to SP, we 
also introduce in this figure a graph of Csp versus 6iK (replacing S^k)- Since 63K < ^ak, the constant 
Csp is actually better than the values shown in the graph, and yet, it can be seen that even in this case 
we get Csp < CcoSaMP- In addition, the requirement for the SP is expressed with respect to S^k, while 
the requirement for the CoSaMP is stronger and uses S^k- 

With comparison to the results presented in | |2T] | for the OMP and the thresholding, the results obtained 
for the CoSaMP and SP are uniform, expressed only with respect to the properties of the dictionary D. 
These algorithms' validity is not dependent on the values of the input vector x, its energy, or the noise 

'As before, we replace the constant 16.6 with the equivalent expression that depends on S4K ~ see i |III.12t . 
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Fig. 2. The constants of the SP and CoSaMP algorithms as a funtion of 54k 

power. The condition used is the RIP, which implies constraints only on the used dictionary and the 
sparsity level. 

C. Near oracle performance of the IHT algorithm 

The IHT algorithm, presented in Algorithm [3l uses a different strategy than the SP and the CoSaMP. It 
applies only multiplications by D and D*, and a hard thresholding operator. In each iteration it calculates 
a new representation and keeps its K largest elements. As for the SP and CoSaMP, here as well we employ 
a fixed number of iterations as a stopping criterion. 

Algorithm 3 IHT Algorithm [Equation 7 in pl51] 

Input: K, D, y where y = Dx + e, K is the cardinality of x and e is the additive noise. 
Output: ^iHT- i^-sparse approximation of x 
Initialize the support: T^ = 0. 
Initialize the representation: ^^jjj- = 0. 
while halting criterion is not satisfied do 

Compute the representation: Xp = x^^^^^ + D*(y — 'DicjJ^rp). 
Prune small entries in the representation: T^ = supp(xp,ir). 
Update the representation: x^^^.^^^^ = and ii.^jj^rprpi = (xp)^-^. 
end while 
Form the final solution: ii.jjjrpjrpfjc = and iiHT,T'' = {^p)t''- 

Similar results, as of the SP and CoSaMP methods, can be sought for the IHT method. Again, the 
proofs are very similar to the ones shown before for the SP and the CoSaMP and thus only the differences 
will presented. 
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Theorem 3.7: The IHT solution at the ^-th iteration satisfies the recurrence inequality 



X — X 



IHT 



<V86: 



3K 



X — X 



-1 
IHT 



+ 4||D5.e|L, 



(III.20) 



For 63K < -4^ this leads to 



X — X 



IHT 



< 0.5 



X — X 



-e-1 

IHT 



+ 4 D:^e 



Te*^ll2 



(III.21) 



Proof: Our proof is based on the proof of Theorem 5 in |15|, and only major modifications in the 
proof will be presented here. Using the definition r^ = x — 'x.\^rp, and an inequality taken from Equation 
(22) in m, it holds that 

(in.22) 



X - rx^iHT 


^ < 2 Xj-ur^ - (Xp)tut* 2 




= 2 


yirpyjrpi — iCrp^jj,/, " D^^/p^Dr " 


'Drp^jnpee 


= 2 


e-1 Y)* rjr^-i n* p 


2' 



where the equality emerges from the definition Xp = Xjj^j, + D* (y — Dx^^l^) = x^^^^. + D* (Dx + e 



Dx 



IHT) 



The support of r^^^ is over T U T^^^ and thus it is also over T U T^ U T^^^. Based on this, we can 
divide Dr^~^ into a part supported on T^^i — X^uT and a second part supported on T^ U T. Using this 
and the triangle inequality with (IIII.22b . we obtain 



X — :>i.jHT 

e-1 



(III.23) 



<2 

= 2 



±Jrp, ,rn£±jY 



(I — 'Drpyjrpe'Drpyjrpejrrp^rpl 
-L)rp, ,rplL) 



' II TUT^ II2 



TUT'^^T'-^-TUT'^^T'^-^-TUT'^ 



+ 2||D, 



TUT^'^lb 



< 2 

+ 2 

<262K 



(I — Y)rpyjrpe'Drpijrpi)rj,^j,e 



I-^ruT* Dt*^-! -TUT* T^T'-^-TUT'^ 



+ 2||d:^ ae 



-TUT* 



+ 2,5: 



3K 



J-1 



Te,2«|l2 



+ 4 D^eL 

II -'e 112 



The last inequality holds because the eigenvalues of (I — D^^jn^D-ruTf) are in the range [—S2k,S2k], 
the size of the set T U T^ is smaller than 2K, the sets T U T^ and T^-^ - T U T^ are disjoint, and of 
total size of these together is equal or smaller than 3K. Note that we have used the definition of Te,2 as 
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given in Section JI] 

We proceed by observing that r^J^.i ,j,^j,f + 
orthogonal. Using the fact that 52k < ^'iK we get ( IIII.20I ) from ( IIII.23I ). Finally, under the condition 



'TUT« 



< \/2||r^ ^IL, since these vectors are 



^ZK < l/v32, it holds that the coefficient multiplying 
completes our proof. 



X — X 



-1 

IHT 



is smaller or equal to 0.5, which 



D 



Corollary 3.8: Under the condition 6sk < 1/v 32, the IHT algorithm satisfies 



X — :x.iHT 



<2"M|x|L + 8||D:^e|L. 

" "^ II -^e 11^ 



In addition. After at most 



iterations, the solution ^iht leads to an accuracy 




where 



X - Xjjjj' < ClHT W^tM 



ClHT = 9. 



Proof: The proof is obtained following the same steps as in Corollaries 13.21 and 



(III.24) 



(ni.25) 



(III. 26) 

(III.27) 
D 



Finally, considering a random noise instead of an adversial one, we get a near-oracle performance 
guarantee for the IHT algorithm, as was achieved for the SP and CoSaMP. 

Theorem 3.9: Assume that e is a white Gaussian noise with variance ct^ and that the columns of D are 
normalized. If the condition 63K < 1/^/32 holds, then with probability exceeding 1 — (^7r(l + a) log A^- 
AT")-! we obtain 



|x - x/htIIs < C]ht ■ (2(1 + a) logN) ■ Ka^. 



Proof: The proof is identical to the one of Theorem 



(III.28) 



D 



A comparison between the constants achieved by the IHT, SP and DS is presented in Fig. [3] The 
CoSaMP constant was omitted since it is bigger than the one of the SP and it is dependent on 64K 
instead of 6sk- The figure shows that the constant values of IHT and DS are better than that of the 
SP (and as such better than the one of the CoSaMP), and that the one of the DS is the smallest. It is 
interesting to note that the constant of the IHT is independent of d^K- 
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Fig. 3. The constants of the SP, IHT and DS algorithms as a funtion of 5s 



Alg. 


RIP Condition 


Probability of Correctness 


Constant 


The Obtained Bound 


DS 


S2K + S3K < 1 


l-(v/7r(l + a)logiV-iV'')-^ 


4 


Cl s ■ {2{1 + a) log N)-Kcr^ 


l-2fe,c 


BP 


S2K + 3S3K < 1 


l_(iV")-^ 


> 7;^ 


Cf, p ■ (2(1 + a) log N)-K a' 


SP 


SsK < 0.139 


l-(^7r(l + a)logiV-iV'^)-^ 


< 21.41 


C^p ■{2{l + a)log N)-Ka^ 


CoSaMP 


SiK < 0.1 


I ~ {^ TT {1 + a) log N-N")-^ 


<34.2 


CloSaMp-m + a)logN)-Ka^ 


IHT 


^3A- < TM 


l-{^T:{l + a)logN-N'')-' 


9 


C|^j,-(2(l + a)logAr)-i^a2 



TABLE I 

Near oracle performance guarantees for the DS, BP, SP, CoSaMP and IHT techniques. 



In table |I] we summarize the performance guarantees of several different algorithms - the DS L8J, the 
BP II29I . and the three algorithms analyzed in this paper. 
We can observe the following: 

1) In terms of the RIP: DS and BP are the best, then IHT, then SP and last is CoSaMP 

2) In terms of the constants in the bounds: the smallest constant is achieved by DS. Then come IHT, 
SP CoSaMP and BP in this order. 

3) In terms of the probability: all have the same probability except the BP which gives a weaker 
guarantee. 

4) Though the CoSaMP has a weaker guarantee compared to the SP, it has an efficient implementation 
that saves the matrix inversion in the algorithm]^ 



The proofs of the guarantees in this paper are not valid for this case, though it is not hard to extend them for it. 
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For completeness of the discussion here, we also refer to algorithms' complexity: the IHT is the 
cheapest, CoSaMP and SP come next with a similar complexity (with a slight advantage to CoSaMP). 
DS and BP seem to be the most complex. 

Interestingly, in the guarantees of the OMP and the thresholding in [211 better constants are obtained. 
However, these results, as mentioned before, holds under mutual-coherence based conditions, which are 
more restricting. In addition, their validity relies on the magnitude of the entries of x and the noise power, 
which is not correct for the results presented in this section for the greedy-like methods. Furthermore, 
though we get bigger constants with these methods, the conditions are not tight, as will be seen in the 
next section. 

IV. Experiments 

In our experiments we use a random dictionary with entries drawn from the canonic normal distribution. 
The columns of the dictionary are normalized and the dimensions are m = 512 and N = 1024. The 
vector X is generated by selecting a support uniformly at random. Then the elements in the support are 
generated using the following modelf|: 

Xi = 10ei(l + |ni|) (IV.l) 

where e^ is ±1 with probability 0.5, and rij is a canonic normal random variable. The support and the 
non-zero values are statistically independent. We repeat each experiment 1500 times. 

In the first experiment we calculate the error of the SP, CoSaMP and IHT methods for different sparsity 
levels. The noise variance is set to cr = 1. Fig. |4]presents the squared-error ||x — xHg of all the instances 
of the experiment for the three algorithms. Our goal is to show that with high-probability the error 
obtained is bounded by the guarantees we have developed. For each algorithm we add the theoretical 
guarantee and the oracle performance. As can be seen, the theoretical guarantees are too loose and the 
actual performance of the algorithms is much better. However, we see that both the theoretical and the 
empirical performance curves show a proportionality to the oracle error. Note that the actual performance 
of the algorithms' may be better than the oracle's - this happens because the oracle is the Maximum- 
Likelihood Estimator (MLE) in this case ll3ll . and by adding a bias one can perform even better in some 
cases. 

'This model is taken from tlie experiments section in fSl. 
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(c) IHT method 

Fig. 4. The squared-error as achieved by the SP, the CoSaMP and the IHT algorithms as a function of the cardinality. The 
graphs also show the theoretical guarantees and the oracle performance. 



Fig. |5(a)| presents the mean-squared-eiror (by averaging all the experiments) for the range where the 
RIP-condition seems to hold, and Fig. |5(b)| presents this error for a wider range, where it is likely top 
be violated. It can be seen that in the average case, though the algorithms get different constants in their 
bounds, they achieve almost the same performance. We also see a near-linear curve describing the error 
as a function of K. Finally, we observe that the SP and the CoSaMP, which were shown to have worse 
constants in theory, have better performance and are more stable in the case where the RIP-condition do 
not hold anymore. 

In a second experiment we calculate the error of the SP, the CoSaMP and the IHT methods for 
different noise variances. The sparsity is set to K = 10. Fig. [6] presents the error of all the instances of 
the experiment for the three algorithms. Here as well we add the theoretical guarantee and the oracle 
performance. As we saw before, the guarantee is not tight but the error is proportional to the oracle 
estimator's error. 

Fig. H presents the mean-squared-error as a function of the noise variance, by averaging over all the 
experiments. It can be seen that the error behaves linearly with respect to the variance, as expected from 
the theoretical analysis. Again we see that the constants are not tight and that the algorithms behave 
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Fig. 5. The mean-squared-error of the SP, the CoSaMP and the IHT algorithms as a function of the cardinality. 
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(c) IHT method 

Fig. 6. The squared-error as achieved by the SP, the CoSaMP and the IHT algorithms as a function of the noise variance. The 
graphs also show the theoretical guarantees and the oracle performance. 



in a similar way. Finally, we note that the algorithms succeed in meeting the bounds even in very low 
signal-to-noise ratios, where simple greedy algorithms are expected to fail. 

V. Extension to the non-exact sparse case 

In the case where x is not exactly i^-sparse, our analysis has to change. Following the work reported 
in |[T3]| . we have the following error bounds for all algorithms (with the different RIP condition and 
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Fig. 7. The mean-squared-error of the SP, the CoSaMP and the IHT algorithms as a function of the noise variance. 



constant). 

Theorem 5.1: For the SP, CoSaMP and IHT algorithms, under their appropriate RIP conditions, it 
holds that after at most 



log2 



I -t e I I Z ^ 



(V.l) 



iterations, the estimation x gives an accuracy of the form 



Ix-icL <C{ D^^eL 



(V.2) 



I n I X Ml II , 1 + '^-^ II II 

+ (1 + dK) ||X - XXII2 H y=^ ||X - Xxlli 



K 



where x/^ is a ii'-sparse vector that nulls all entries in x apart from the K dominant ones. C is the 
appropriate constant value, dependent on the algorithm. 

If we assume that e is a white Gaussian noise with variance a'^ and that the columns of D are 
normalized, then with probability exceeding 1 — (^7r(l + a) log A^ • A^'')^^ we get that 



X-XII2 <2-C^(^{l + a)logN-K-a 



(V.3) 



x-xj^ L + 



K 



X-Xii- 



Proof: Proposition 3.5 from [13] provides us with the following claim 



|Dx||2 < vT+T^ M|x||2 + -j= ||x|| J 



(V.4) 
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When X is not exactly i^-sparse we get that the effective error in our results becomes e = e + D(x — xj^). 
Thus, using the error bounds of the algorithms with the inequality in (IV.4b we get 



Ix — xlL < C D^ e L 

' ' ' ^ M -'^ e \\ Z 



(V.5) 



< C||D^Je + D(x-xx))||2 

< C||D^e||+C||D^D(x-xx)|L 



< C||D^^e||+CVr+5^||D(x-x^)||2 

< CM|D^^e||2 + (l + 5x)||x-xx||2 

\\^-^k\\i 



which proves (IV.2I ). Using the same steps taken in Theorems 13.31 13.61 and 13.91 lead us to 

||x-i||2 < C^(Vi2{l + a)\ogN)-K-a 



(V.6) 



+ (1 + 5k) ||x - Xi^||2 + 



1 + Sk 



K 



x-xx 



Since the RIP condition for all the algorithms satisfies 5k < \/2 — 1, plugging this into (I V.6 
and this concludes the proof. 



gives (IV.3I ). 

D 



Just as before, we should wonder how close is this bound to the one obtained by an oracle that knows 
the support T of the K dominant entries in x. Following |[T5l . we derive an expression for such an oracle. 
Using the fact that the oracle is given by D^^y = Dj,(Dxt + D(x — xy) + e), its MSB is bounded by 



E IIX - Xnr, 



■oracle\\2 



E 



x-D^y 



(V.7) 



E 



X — XT — Dj,e — Dj,D(x — xy 



< 



(II. 



XT + 



D:^D(x - xt) 



2 

+ E 



Y)\,e 



where we have used the triangle inequality. Using the relation given in (11.101 ) for the last term, and 
properties of the RIP for the second, we obtain 



£/ ||X ^orac/e II 2 — 



(V.8) 



||x - XTII2 + 



1 n^, M, ^K 



May 26, 2010 



DRAFT 



PREPRINT 25 



Finally, the middle-term can be further handled using (IV.4b . and we arrive to 

E ||X - ^oracleWl < ^3^ ((^ + ^1 + ^k) ||x - ^T^ (V.9) 



H ^ — ||x - Xft-y^ + VKa 



2 



Thus we see again that the error bound in the non-exact sparse case, is up to a constant and the log A^ 
factor the same as the one of the oracle estimator. 

VI. Conclusion 

In this paper we have presented near-oracle performance guarantees for three greedy-like algorithms 
- the Subspace Pursuit, the CoSaMP, and the Iterative Hard-Thresholding. The approach taken in our 
analysis is an RIP-based (as opposed to mutual-coherence ones). Despite their resemblance to greedy 
algorithms, such as the OMP and the thresholding, our study leads to uniform guarantees for the three 
algorithms explored, i.e., the near-oracle error bounds are dependent only on the dictionary properties 
(RIP constant) and the sparsity level of the sought solution. We have also presented a simple extension 
of our results to the case where the representations are only approximately sparse. 
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Appendix A 
Proof of Theorem |3.1I - inequality (IIII.ll ) 

In the proof of (IIII.ll) we use two main inequalities: 

II II ^ ^"^sft- II II .. .. 

I IIt^* « II 



and 



I II ^ l + SsK II II .. ^. 

|XT_T*||2 < Y3j II^T-ff|l2 (^-2) 
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Their proofs are in Appendices |B] and O respectively. The inequaUty (IIII.ll) is obtained by substituting 
(lA.ll) into (IA.2I) as shown below: 

II II ^ 1 + <J3ii' II II I '^ IIt-»* II /A -i\ 

1 - d?.K II -* -f "2 1 _ A „ 11/ 



^ 1 + (^37^ 



3i^ - i — 03/f 

2(53i^ 



1 -'^3^- 

2 

(1 -~5m) 



l_5,..^2ll^T-T-^ll2 



^sk; 



+ 7^— ^llD^^ell, 



1 -b^K ^ 



+ 7i — A — ^3 F^TeS 2 + -^ — 7— DreS 2 

^ 2<53i^(l + 53/f) 

6-6^3x + 4J|^ ,, 11 

and this concludes this proof. D 

Appendix B 
Proof of inequality (IA.1I) 

Lemma B.l: The following inequality holds true for the SP algorithm: 

II II ^ ^"^3/^ II II 

II^T-f^ll2 ^ (l_53^)2ll^T-T-^|l2 

I IIt-»* „II 

Proof: We start by the residual-update step in the SP algorithm, and exploit the relation y = Dx + e = 

D7^_7^f-ixj'_7^«-i + D^^piT^f-iXj'nj'*-! + e. This leads to 

y^i=resid(y,DT.-i) (B.l) 

= resid(Dj'_5"«-iX2^_7^f-i,Dj'«-i) 

+ resid(D7^nT«-iX2-n7-«-i,DT«-i) + resid(e, Dt^^-i). 

Here we have used the linearity of the operator resid(-, Dy«-i) with respect to its first entry. The second 
term in the right-hand-side (rhs) is since Y)xr\T'-^'>^Tr\T'^-^ G span(Dj'«-i). For the first term in the 
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rhs we have 



= 'Dj'_j'i-ix.j'—T'-^ ~ proj(DT_T*-ixj'_7^«-i, Dji 



[IJ2^_'P*-i , Dj^f 






where we have defined 



(B.2) 



Xpji^-i — — (Djif-iD^f-i j Jjrpi_i\J'j^__'j'e-i'K'jp_'j'e 



(B.3) 



Combining (IB. lb and (IB.2I ) leads to 



y^ = Dj^uj-c-ix^ + resid(e,D7-f-i' 



(B.4) 



By the definition of Ta in Algorithm [T] we obtain 






> 






> 



D 



T-T^-iYr 



> 



> 



^-1 



D ji_ jif - 1 Dt^li j'«- 1 X, 



Dj'-T*-! resid(e,D7^«-i' 



£-1 



D/p_^(>-iD2"UT^-iX' 



|D^„j.f_i proj(e,Dj'«-i)| 



D* 



We will bound IID^ ,p^_i proj(e,Dr«-i)|| from above using RIP properties from Section Ull 



(B.5) 



|D^_2.<!-i proj(e,DT«-i)||2 
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Combining (IB.5I) and (IB.6I) leads to 



Dkyf"^ 



> 



> 



— IIT)* pII 



±Jrp_rp^_l L) 



Drp nn£_l JJ^^IJ^^^ — 1 X, 

S3K 



l-l 



(B.7) 



3-^ „D^.-.e||, 



rp_rpe_llJrpyjrpi-lX^ 



1 






2 1 - (53/^ 

By the definition of Ta and yfr^ it holds that Ta n T^"^ = since D^^.^iyfr^ = 0. Using dBlT) . the 
left-hand-side (Ihs) of (IB.7I) is upper bounded by 






< 
< 

< 
< 



Dy. Dru^f-ix;. 



J-i 



d:5..d 



TA-'-'TUTf-iX.r 



,^-1 



+ ||D^^ resid(e,Dr<! 
+ l|DT.e||2 



+ Dj^ Dj'«-l (D/Tn£-lD'p«-l j D/Tif-lG 






+ D:^ e 



rA«ll2 



+ 



1 -SsK 



DS^ D 



Ta-'-'TUT^-iX, 



1 



+ -. ^ Pre L. 

2 I-63K " ^ "2 



Combining (IB.7I ) and (IB.81 ) gives 



DtaDtut^-ix: 
> 



i-1 



2 1 - ^SA- 
1 



D* ^ I I 



Dji_jif_iDj'uT«-ix! 
Removing the common rows in D^ and D^ ,p<._i we get 



D^^.j.D'TuT-i-ix^ 



> 



D 



T-T*-i-TA-^ruT*-iX 



D 



2 1 - (53i^ 

1 



iDrel 

-'^ e 



D^_^,D7^UT*-iX^ 

The last equality is true because T - T^"^ - Ta = T - T^"^ - (T^ - T'^-'^] 



rp rpfi 



(B.8) 



(B.9) 



(B.IO) 
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Now we turn to bound the Ihs and rhs terms of (IB.IOI ) from below and above, respectively. For the 
Ihs term we exploit the fact that the supports Ta — T and T U T^^^ are disjoint, leading to 



d:; 



Ta-tDtut'^-ix: 



e-1 



<Si 



|TaUT«-iuT| 



X 



e-1 



(B.ll) 



< S3K 



x!-i 



For the rhs term in dB.lOl) . we obtain 



J-i 






> 
- D* D 

>{1-6^k) (xf,-^)T_f 



(B.12) 



T_j'«-^(TUT*-i)-(T-T*)vXr )(Tur*-i)-(T-f'*) 



-1 



03K 



03K 



x^ 



X 



i-1 



Substitution of the two bounds derived above into (IB.IOI ) gives 



2(53A- 



X 



D* ^ 



2 l-5-iK " ^ "^ 

> (1 -5^k) (Xj.~ )rp_fl 



(B.13) 



The above inequality uses xf ^, which was defined in (IB. 21 ). and this definition relies on yet another one 
definition for the vector Xp^-^i in (IB. 31 ). We proceed by bounding 11x^^^*1112 ^o™ above. 



Xp,T«-i 



(B.14) 
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-1t\* 
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\x.j^_j^i-i II2 
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^3i^ 



1 - 63K 



|Xy_2"*^i II2 ' 



and get 



X 



l-i 



^ ||X'p_'p*-i II2 + X„ 2^4 



(B.15) 



< 1 + 



63K 



1 - 63K 



|x^_2i«-i II2 
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1 - 63K 



|x'p_'pf-i II2 
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In addition, since (xf ^)x-t'~'^ = ^t-T'-^ then (xf ^)x-f^ — ^^-f'^- Using this fact and (IB.15I) with 
(|BJ3]) leads to 



(1 - 63Kr 

which proves the inequaUty in dA.ll ). 



^T-Tn\2 

^ 2'^3A" II II , 2 II II 



(1 - hK)'' 



(B.16) 



□ 



Appendix C 
Proof of inequality (IA.2I) 

Lemma C.l: The following inequality holds true for the SP algorithm: 



|x2^_2if||2 ^ 



l + <53i^ 



+ 



D' * ^ I I 

I -te 112 



Proof: We will define the smear vector e = Xp — Xj,^ , where Xp is the outcome of the representation 
computation over T^, given by 



Xp = Dt^y = Dt^(DTXT + e), 
as defined in Algorithm [T] Expanding the first term in the last equality gives: 



(C.l) 



-L'^j-UtXt — ^j'tTnT^'^TnT'^ ' ^fftT—T'^'^T—T^ 



f« V^Tr\f^ ' -^T 



T'^-rJ 







~r ^rp^^'j'_'j'i^'j'_'j'(i 






(C.2) 



The equalities hold based on the definition of Dt and on the fact that x is outside of T. Using (IC.2I ) 
we bound the smear energy from above, obtaining 



H2 < 



D^.Dtxt 



+ 






(C.3) 



^J_J^^ J-Jj^^ J ^rY'i^rp_rpi'X.rp_fj^l 
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We now turn to bound ||e||2 from below. We denote the support of the K smallest coefficients in Xp 
by AT = f^-T^. Thus, for any set T' C f^ of cardinality K, it holds that ||(xp)Ar||2 < ll(xp)T'|l2- I" 
particular, we shall choose T' such that T' r\T = %, which necessarily exists because T^ is of cardinality 
2K and therefore there must be at K entries in this support that are outside T. Thus, using the relation 

e = Xp — Xj,f we get 



(Xp)at||2 < ll(Xp)T' 



'^fl )rp, + £T' 



(C.4) 



eT'|l2 < ll^l 



Because x is supported on T we have that ||xat||2 = ||xATnT||2- An upper bound for this vector is 
reached by 



||xArnT|l2 — ll(xp)ATnT - eATnT|l2 

< ||(Xp)ATnT|l2 + l|eArnT|l2 

< ||(xp)AT|l2 + l|e|l2 < 2||e||2, 

where the last step uses ( IC.4I ). The vector xt_t« can be decomposed as xt_t« 



(C.5) 



^TnAT'^r_f(! 



Using (IC3I ) and (IC3I ) we get 



|XT_T«||2 < ||xTnAT|l2 + ||x2'_'ff II2 ^ 2 ||e||2 + ||Xy_y^||2 



< 1 + 



1 - SsK 

1 + S3K 



X, 



+ 



"^ l-d3K 

4 



D*- e 



Xr, 



+ 



D* ^ w 
, tM\2 



where the last step uses the property 
the proof. 



< 2 I Ids. elL taken from Section [III and this concludes 



D* e 



D 



Appendix D 
Proof of inequality (IIII.12I) 

Lemma D.l: The following inequality holds true for the CoSaMP algorithm: 



X — ^CoSaMP 



< 



;i - s^Kr 



-e-i 

^CoSaMP 






Proof: We denote ^coSaMP ^^ '■'^^ solution of CoSaMP in the £-th iteration: ^coSaMP(T<^)<^ ~ ^ ^'^'^ 
hoSaMPT'^ = (xp)j'«. We further define r^ = x — y^coSaMP ^^'^ ^^^ ^^^ definition of Te^p (Section HIl). 
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Our proof is based on the proof of Theorem 4.1 and the Lemmas used with it |[T3l . 

Since we choose Ta to contain the biggest 2K elements in D*y^ and |T^~^ U T] < 2K it holds true 
that ||(D*y^)7-fur||2 < ||(D*yf)r^ H^. Removing the common elements from both sides we get 



(D*y,,)(T''ur)-TA < (D*yr)rA-(T<!uT) 



\*,J'\ 



(D.l) 



We proceed by bounding the rhs and Ihs of (ID. II ). from above and from below respectively, using the 
triangle inequality We use Propositions 12.11 and 12.21 the definition of Te,2, and the fact that ||r^|L = 



'r*UTll2 



(this holds true since the support of r^ is over T U T^). For the rhs we obtain 



(D*yr)rA-(T''ur) 
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Ta-{T*UT) 



(D/ + e) 
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-'-^Ta - (T«UT) DT^UTr^fuT 
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rA-(T'UT)* 



< hx 
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Ti*rp e 



and for the Ihs: 



(D.2) 



(D*yf) 



(T«UT)-Ta 



D 



(T«UT)-Ta 



^ fD/ + e^ 



> 



D(rfur)-TA^(r*UT)-TAr(T<?UT)-TA 



D(T«uT)-Ta ^^a I^Ta 



I-^(T«UT)-Ta^ 



> (1-52X) 



■(T«UT)-Ta 



'^4X 



d:^ e 

-'e.2 



Because r^ is supported over T U T^, it holds true that 
( ID.2I ) with dP.ll ). gives 






(TUT'!)-Ta 



DS. e 

-16.2 



■TC 



(D.3) 



< 



< 



1 - ^2K 

254i^||r^||2+4||D5.^e| 

1 - 6iK 



Combining ( ID.3I ) and 



(D.4) 



May 26, 2010 



DRAFT 



PREPRINT 



33 



For brevity of notations, we denote hereafter T as T. Using y = Dx + e = Dj,Xj, + 'Dfc^fc + e, 
we observe that 



I ^T \^P } T I 



X^^ U - yVJjy'X.jy -\- Urp(^Xrp(^ -\- GJ 



Dt (pfc^fc + e) 



< 



< 



< 



(D;,D^)-^D^D^cX^c + (D^D^)-^D 



-l-Tk* , 



1 



1 - SsK 



^rp^'J'C ^'f'C 



Y - -I- 



1 



2^1-5. 



?,K 



D*r, e 



'T. 






(D.5) 



1 - 5ak ^^ 1 - 5iK 

where the last inequality holds true because of Proposition 12.41 and that \T\ = "iK. Using the triangle 
inequality and the fact that Xp is supported on T, we obtain 



1"^ "^^112 — ll'^T'^ II 2 ll'^T \^P)T II 2 ' 



(D.6) 



which leads to 



|x - Xpllg < ( 1 + 



8iK \ w II 3 II II 

1 -Oak J '^ 1 -OiK ■^ 



1 1 II 3 11^, I 



Having the above results we can obtain (IIII.12b by 
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71 A \2 Pr.*^ 2' 

(1 - dA^Kr 



where the inequahties are based on Lemma 4.5 from \\3}, (ID.7I ). Lemma 4.3 from |[T3l and (ID. 41) 
respectively. D 
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